Artificial intelligence (AI)-assisted methods have attracted much attention in risk-sensitive domains such as disease diagnosis. Unlike classifying disease types, classifying medical images as benign or malignant tumors is a fine-grained task. However, most studies focus only on improving diagnostic accuracy while neglecting the evaluation of model reliability, which limits clinical deployment. For clinical practice, calibration in the low-data regime presents major challenges, made especially apparent by over-parameterized models and inherent noise. In particular, we find that modeling data-dependent uncertainty is more conducive to confidence calibration. Compared with test-time augmentation (TTA), we propose a modified bootstrapping loss (BS loss) function combined with a mixup data augmentation strategy, which better calibrates predictive uncertainty and captures data distribution shift without additional inference time. Our experiments show that the BS loss with mixup (BSM) model can halve the expected calibration error (ECE) compared with standard data augmentation, deep ensembles, and MC dropout. Under the BSM model, the correlation between uncertainty and similarity reaches -0.4428. Moreover, the BSM model can perceive the semantic distance of out-of-domain data, showing high potential for real-world clinical practice.
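The reported metric, expected calibration error, has a standard binned form: partition predictions by confidence and take the weighted mean gap between accuracy and confidence in each bin. A minimal sketch (not the authors' code; bin count and binning scheme are implementation choices):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()       # empirical accuracy in this bin
        conf = confidences[mask].mean()  # mean predicted confidence in this bin
        ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A perfectly calibrated model would score 0; "halving the ECE" in the abstract refers to reducing this quantity.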
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combinations delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
Spatial structure in 3D space is important for determining molecular properties. Recent papers use geometric deep learning to represent molecules and predict properties. However, these works are computationally expensive when capturing the long-range dependencies of input atoms; moreover, they have not considered the non-uniformity of interatomic distances and thus fail to learn context-dependent representations at different scales. To address these issues, we introduce the 3D Transformer, a variant of the Transformer for molecular representations that incorporates 3D spatial information. The 3D Transformer operates on a fully connected graph with direct connections between atoms. To cope with the non-uniformity of interatomic distances, we develop a multi-scale self-attention module that exploits local fine-grained patterns with increasing contextual scales. As molecules of different sizes rely on different kinds of spatial features, we design an adaptive positional encoding module that adopts different positional encoding methods for small and large molecules. Finally, to obtain a molecular representation from atom embeddings, we propose an attentive farthest point sampling algorithm that selects a subset of atoms with the help of attention scores, overcoming the drawbacks of virtual nodes and previous distance-dominant downsampling methods. We validate the 3D Transformer in three important scientific domains: quantum chemistry, materials science, and proteomics. Our experiments show significant improvements over state-of-the-art models on crystal property prediction and protein-ligand binding affinity prediction tasks, and better or competitive performance on quantum chemistry molecular datasets. This work provides clear evidence that biochemical tasks can gain consistent benefits from 3D molecular representations, and that different tasks require different positional encoding methods.
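The attention-guided downsampling idea can be illustrated in miniature: pick the atoms with the highest attention scores and pool their embeddings into one molecular vector. This is a simplified stand-in, not the paper's attentive farthest point sampling, which also accounts for spatial coverage:

```python
import numpy as np

def attentive_subset_pooling(atom_embeddings, attention_scores, k):
    """Select the k atoms with the highest attention scores and average
    their embeddings into a single molecule-level representation.
    (Hypothetical simplification of attention-guided atom sampling.)"""
    idx = np.argsort(attention_scores)[::-1][:k]  # top-k atoms by score
    return atom_embeddings[idx].mean(axis=0)
```

The attention scores would come from the model's self-attention layers, so the sampling step stays differentiable with respect to which atoms matter.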
Facial expression recognition is a challenging task in computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as ambiguous facial expressions and inconsistent labels. However, current research has focused either on the problem of class imbalance or on the problem of uncertainty, ignoring the intersection of how to address these two problems. Therefore, in this paper, we propose a framework based on ResNet and attention to solve the above problems. We design a weight for each class. Through this penalty mechanism, our model pays more attention to the learning of small classes during training, and the resulting decrease in model accuracy can be mitigated by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network also learns an uncertainty feature for each sample. By mixing uncertainty features between samples, the model can better learn those features that can be used for classification, thus suppressing uncertainty. Experiments show that our method surpasses most baseline methods in terms of accuracy on facial expression datasets (e.g., AffectNet, RAF-DB), and it also handles class imbalance well.
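One common way to realize the per-class penalty described above is inverse-frequency loss weighting, so that rare classes contribute more to the loss. A minimal sketch of that scheme (the paper's exact weighting rule is not specified here):

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class loss weights inversely proportional to class frequency,
    normalized so a balanced dataset gives weight 1.0 to every class.
    (A common scheme; an assumption, not necessarily the authors' formula.)"""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))
```

Such weights are typically passed to the cross-entropy loss, so mistakes on small classes are penalized more heavily.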
Recently, the self-supervised pre-training paradigm has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance. However, increasing the scale of unlabeled pre-training data in real-world scenarios requires prohibitive computational costs and faces the challenge of uncurated samples. To address these issues, we build a task-specific self-supervised pre-training framework from a data selection perspective, based on a simple hypothesis that pre-training on unlabeled samples with a distribution similar to the target task can bring substantial performance gains. Buttressed by this hypothesis, we propose a novel framework for Scalable and Efficient visual Pre-Training (SEPT) by introducing a retrieval pipeline for data selection. SEPT first leverages a self-supervised pre-trained model to extract features of the entire unlabeled dataset for retrieval pipeline initialization. Then, for a specific target task, SEPT retrieves, for each target instance, the most similar samples from the unlabeled dataset based on feature similarity. Finally, SEPT pre-trains the target model on the selected unlabeled samples in a self-supervised manner before fine-tuning on the target data. By decoupling the scale of pre-training from the available upstream data for a target task, SEPT achieves high scalability of the upstream dataset and high efficiency of pre-training, resulting in high model architecture flexibility. Results on various downstream tasks demonstrate that SEPT can achieve competitive or even better performance compared with ImageNet pre-training while reducing the number of training samples by one order of magnitude, without resorting to any extra annotations.
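The retrieval step described above amounts to a nearest-neighbor search in feature space. A minimal sketch using cosine similarity (the actual SEPT pipeline and its similarity measure may differ):

```python
import numpy as np

def retrieve_topk(target_feats, unlabeled_feats, k):
    """For each target instance, return indices of the k most similar
    unlabeled samples by cosine similarity. (Sketch of a retrieval step,
    not SEPT's actual implementation.)"""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    u = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    sims = t @ u.T                      # (n_targets, n_unlabeled) cosine similarities
    return np.argsort(-sims, axis=1)[:, :k]
```

At the scales the paper targets, this brute-force search would be replaced by an approximate nearest-neighbor index, but the selection logic is the same.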
The primary challenge in few-shot action recognition is the insufficiency of training video data. To address this problem, current methods in this field mainly focus on devising algorithms at the feature level, while paying little attention to processing the input video data. Moreover, existing frame sampling strategies may omit critical action information in the temporal and spatial dimensions, further impairing video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition to address this issue, where task-specific spatial-temporal frame sampling is achieved via a Temporal Selector (TS) and a Spatial Amplifier (SA). Specifically, our sampler first scans the whole video at a small computational cost to obtain a global perception of the video frames. The TS plays its role in selecting the top frames that are most salient and contribute most to the task. The SA emphasizes the discriminative information of each frame by amplifying critical regions under the guidance of saliency maps. We further adopt task-adaptive learning to dynamically adjust the sampling strategy according to the episodic task at hand. The implementations of both TS and SA are end-to-end optimizable, facilitating seamless integration of our proposed sampler into most few-shot action recognition methods. Extensive experiments show significant performance improvements on various benchmarks, including long-term videos.
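At its core, the temporal selection step reduces to scoring each frame and keeping the highest-scoring ones in their original order. A minimal non-differentiable sketch (the paper's TS is end-to-end optimizable, which this toy version is not):

```python
def select_frames(saliency, n):
    """Pick the n frames with the highest saliency scores, returned in
    temporal order. (A hypothetical simplification of the Temporal Selector.)"""
    top = sorted(range(len(saliency)), key=lambda i: -saliency[i])[:n]
    return sorted(top)  # restore temporal order for the downstream network
```

Keeping the selected indices sorted preserves the temporal structure that action-recognition backbones rely on.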
Currently, quantization methods for neural network models fall mainly into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization requires only a small amount of data to complete the quantization process, but the performance of its quantized models is inferior to that of quantization-aware training. This paper proposes a novel quantization method called Attention Round. This method gives the parameter w the opportunity to be mapped to any of the possible quantized values, rather than only the two quantized values adjacent to w, during quantization. The probability of being mapped to the different quantized values is negatively correlated with the distance between the quantized value and w, and decays with a Gaussian function. In addition, this paper uses lossy coding length as a measure to assign bit widths to the different layers of the model, solving the mixed-precision quantization problem while effectively avoiding a combinatorial optimization problem. This paper also conducts quantization experiments on different models, and the results confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process, achieving quantization performance on par with quantization-aware training.
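The Gaussian-decaying mapping probability described above can be sketched directly: score every quantization level by its distance to w, then sample a level from the resulting distribution. A minimal illustration under the abstract's description (the paper's exact parameterization, e.g. the choice of sigma, is not given here):

```python
import numpy as np

def gaussian_round_probs(w, q_levels, sigma):
    """Probability of mapping weight w to each quantized level; probability
    decays with a Gaussian in the distance |w - q|, per the abstract."""
    d = np.asarray(q_levels, dtype=float) - w
    p = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return p / p.sum()

def stochastic_quantize(w, q_levels, sigma, rng):
    """Sample a quantized value for w from the Gaussian-decay distribution."""
    return rng.choice(q_levels, p=gaussian_round_probs(w, q_levels, sigma))
```

Unlike nearest-neighbor rounding, every level has nonzero probability, with the closest level the most likely, which is the key property the method exploits.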
As a leading cause of morbidity worldwide, intestinal parasite infections still lack time-saving, highly sensitive, and user-friendly examination methods. The development of deep learning techniques has revealed broad application potential in bio-imaging. In this paper, we apply several object detectors, such as YOLOv5 and variant Cascade R-CNNs, to automatically discriminate parasite eggs in microscope images. Through specially designed optimizations, including raw data augmentation, model ensembling, transfer learning, and test-time augmentation, our model achieves excellent performance on the challenge dataset. In addition, our model, trained with added noise, attains higher robustness against polluted inputs, which further broadens its applicability in practice.
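Test-time augmentation, one of the optimizations listed, simply averages the model's predictions over several augmented views of the same image. A generic sketch (the specific augmentations used by the authors are not stated):

```python
import numpy as np

def tta_predict(model, image, augmentations):
    """Average predictions over augmented views of one image (generic
    test-time augmentation; a sketch, not the authors' pipeline)."""
    preds = [model(aug(image)) for aug in augmentations]
    return np.mean(preds, axis=0)
```

For detection, the per-view outputs would additionally need to be mapped back to the original image coordinates before merging (e.g. un-flipping boxes), which this scalar-output sketch omits.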
The graph coloring problem (GCP) is a classic combinatorial optimization problem with wide applications in both theoretical research and engineering. To solve complex GCPs efficiently, a distribution evolutionary algorithm based on a population of probability models (DEA-PPM) is proposed. Based on a novel representation of the probability model, DEA-PPM employs a Gaussian orthogonal search strategy to explore the probability space, which enables global exploration with a small population. Assisted by local exploitation on a small population of solutions, DEA-PPM strikes a good balance between exploration and exploitation. Numerical results demonstrate that DEA-PPM performs well on selected complex GCPs, which contributes to its competitiveness against state-of-the-art metaheuristics.
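For readers unfamiliar with the GCP itself: the task is to assign colors to vertices so that no edge joins two same-colored vertices, using as few colors as possible. The classic greedy baseline against which metaheuristics like DEA-PPM are measured looks like this (a standard textbook algorithm, not DEA-PPM):

```python
def greedy_coloring(adj):
    """Greedy graph coloring: each vertex gets the smallest color not used
    by its already-colored neighbors. adj maps vertex -> neighbor list."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:  # find the smallest unused color
            c += 1
        colors[v] = c
    return colors
```

Greedy coloring is fast but can use far more colors than necessary on hard instances, which is exactly where evolutionary approaches such as DEA-PPM aim to do better.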
Modern semantic segmentation methods pay much attention to adjusting feature representations to improve segmentation performance in various ways, such as metric learning and architecture design. However, almost all of these methods neglect the particularity of boundary pixels. Owing to the continual expansion of receptive fields in CNN networks, these pixels are prone to obtaining confusing features from both sides of a boundary. In this way, they mislead the model's optimization direction and cause the class weights of categories that tend to share many adjacent pixels to lack discrimination, which harms overall performance. In this work, we dive deep into this problem and propose a novel method named Embedded Superpixel CRF (ES-CRF) to address it. ES-CRF involves two main aspects. On the one hand, ES-CRF innovatively fuses the CRF mechanism into the CNN network as an organic whole to achieve more effective end-to-end optimization. It uses CRF-guided message passing between pixels in the high-level features to purify the feature representations of boundary pixels, with the help of inner pixels that belong to the same object. On the other hand, superpixels are integrated into ES-CRF to exploit a local object prior for more reliable message passing. Finally, our proposed method sets new records on two challenging benchmarks, namely Cityscapes and ADE20K. Moreover, we conduct a detailed theoretical analysis to verify the superiority of ES-CRF.
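The superpixel prior can be illustrated with the crudest possible message-passing step: replace each pixel's feature with the mean feature of its superpixel, so boundary pixels are pulled toward the inner pixels of the same region. This is a stand-in for intuition only, not the learned CRF-guided message passing of ES-CRF:

```python
import numpy as np

def superpixel_smooth(features, sp_labels):
    """Replace each pixel feature with the mean feature of its superpixel.
    features: (N, C) pixel features; sp_labels: (N,) superpixel ids.
    (A crude stand-in for superpixel-guided message passing.)"""
    out = np.empty_like(features)
    for s in np.unique(sp_labels):
        mask = sp_labels == s
        out[mask] = features[mask].mean(axis=0)  # average within the superpixel
    return out
```

In ES-CRF the aggregation is attention-weighted and learned end-to-end rather than a flat average, but the role of the superpixel, restricting messages to pixels likely belonging to one object, is the same.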